perm filename A08.TEX[154,PHY] blob
sn#816452 filedate 1986-05-05 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00002 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 \magnification\magstephalf
C00012 ENDMK
C⊗;
\magnification\magstephalf
\input macro.tex
\def\today{\ifcase\month\or
January\or February\or March\or April\or May\or June\or
July\or August\or September\or October\or November\or December\fi
\space\number\day, \number\year}
\baselineskip 14pt
\rm
\line{\sevenrm a08.tex[154,phy] \today\hfill}
\bigskip
\line{\bf The Partition Theorem\hfill}
In a CFL with grammar $G$, if $x↓1x↓2\ldots x↓k=x\Rightarrow y$ by
production $A→w$, $A$~must occur in some~$x↓j=uAv$. Define $y↓j=uwv$,
$y↓i=x↓i$ if $i≠j$; we find $x=x↓1x↓2\ldots x↓k\Rightarrow y↓1y↓2
\ldots y↓k=y$. By induction, if $x↓1x↓2\ldots x↓k=x\aRa y$, then $y$~may
be factored into $y↓1y↓2\ldots y↓k$ with $x↓i\aRa y↓i$.
\proclaim
Theorem. If the first string in a derivation is a concatenation of several
substrings, subsequent strings and the entire derivation may be factored
in a corresponding way.
For example, any $x↓i$ may be changed to~$y↓i$ before any other rewriting.
\medskip
\line{\bf Standardizing Context-Free Grammars\hfill}
\smallskip
\disleft 20pt:(1):
Change each production of the form $A→X↓1X↓2\ldots X↓k$ to the set
$A→X↓1A↓1$,
$A↓1→X↓2A↓2$,
$\ldots\,$,
$A↓{k-1}→X↓kA↓k$,
$A↓k→\epsilon$,
where each~$A↓i$ is a new variable. The set of terminal strings derivable
from~$A$ remains the same, so the language defined remains the same, but
lengths of right hand sides of productions are $≤2$, and the transformations
below preserve this property.
\smallskip
\disleft 20pt:(2):
Determine which variables derive the empty string. For each production
$A→\alpha B\gamma$ where $B\aRa\epsilon$, include the production $A→\alpha\gamma$,
until no further productions are generated; this does not alter the set of
strings derived from each variable. Now omit all productions with~$\epsilon$
as right side. By induction on length of derivation, if
$A\aRa x\in T↑+$ before this transformation, $A\aRa x$ afterward, but no
longer can $A\aRa\epsilon$. The resultant language is the original with
the possible omission of~$\epsilon$, but no right side is empty, so lengths
in derivations are non-decreasing. Transformation~(2) preserves the
postcondition of transformation~(1), as it lengthens no productions.
\smallskip
\disleft 20pt::
To determine which variables derive the empty string, initialize $N$
to~$\emptyset$, then for each production $A→X\in N↑{\ast}$, include~$A$
in~$N$, until no further change occurs. Induction on length of derivations
shows $A\aRa\epsilon$ iff $A\in N$.
\smallskip
\smallskip
\disleft 20pt:(3):
The following transformation requires as a precondition the postcondition
of transformation~(2), that all right sides of productions be nonempty.
Define $P'$ as $P↑+\circ I↓{\Sigma↑{\ast}-V}$, $G'$~as~$G$ with~$P'$
replacing~$P$. Right sides of~$P'$ are not variables, nor empty. Since
$P'\subseteq\,\aRag$, $L↓{G'}\subseteq L↓G$. By induction on length of
derivation, if $X\aRag u\in T↑+$, $X\in\Sigma$, then $X\aRagp u$. For
the induction step, $X\in V$. Let $y=Y↓1Y↓2\ldots Y↓k$ be the first
line in the derivation not in~$V$; $X\togp y=Y↓1Y↓2\ldots Y↓k\aRag
u↓1u↓2\ldots u↓k=u$, where by induction $Y↓i\aRag u↓i$ implies
$Y↓i\aRagp u↓i$, so $X\aRagp u$. The language of~$G'$ is the same as that
of~$G$, but right sides of productions are neither variables nor empty.
The postconditions of~(1) and~(2) are preserved.
\smallskip
\disleft 20pt:(4):
For each $a\in T$, introduce a variable $B↓a$ and a production $B↓a→a$;
replace~$a$ with~$B↓a$ in right sides of length $≥2$. The language is
unchanged, but right sides of length $≥2$ belong to~$V↑{\ast}$. The
postconditions of~(1), (2), and~(3) are preserved, so after (1), (2), (3),
and~(4) all right sides belong to~$T$ or~$V↑2$. This form is called
Chomsky Normal Form (CNF).
\vfill\eject
%\smallskip
\disleft 20pt:(5):
Determine which variables derive some terminal string. Eliminate all variables
that derive no terminal string, and all productions that contain them. This
does not alter the set of terminal derivatives of variables, so the language
is unchanged. If the root symbol is eliminated, the language is empty. Because
no new productions are introduced, the postconditions of (1), (2), (3),
and~(4) are preserved.
\smallskip
\disleft 20pt::
To determine which variables derive a terminal string, initialize $\tau$
to~$\emptyset$. If $A→x\in(\tau\vee T)↑{\ast}$,
insert~$A$ in~$\tau$. When no further insertions are possible, $\tau$~is
the set of variables that derive some terminal string.
\smallskip
\disleft 20pt:(6):
Determine which variables occur in no sentential form (string derived
from~$S$). Omit all such variables, and all productions containing them.
This transformation preserves the postcondition of~(5), because if a
variable~$A$ survives~(6), so does any variable occuring in a derivation
from~$A$ of a terminal string, and so does the entire derivation. After
(5) and~(6), every variable is used in some derivation of a sentence.
Transformation (6) also preserves the postconditions of (1), (2), (3),
and~(4), because it introduces no new productions.
\smallskip
\disleft 20pt::
To determine which variables occur in a sentential form, define relation~$R$
by $XRY$ if $X→\alpha Y\beta$; $S\circ R↑{\ast}$ is the set of variables occurring
in sentential forms.
\smallskip
\disleft 20pt:(7):
If the empty string was eliminated from the language by (2), reintroduce
it with the production $S→\epsilon$.
\smallskip
We conclude:
From any CFG can be constructed another for the same language, in which
all productions are of the forms $V→T$, $V→V↑2$ or $S→\epsilon$, and in
which every variable (except possibly~$S$) may be used in deriving some
sentence. Symbolically,
$$\eqalign{P\subseteq\bigl(V\times(T\cup V\otimes V)\bigr)\cup\{\langle
S,\epsilon\rangle\}\,;\cr
\hbox{If }A\in V,\; S\aRa uAw,\; A\aRa v\,.\cr}$$
\bigskip
\parindent0pt
\copyright 1984 Robert W. Floyd
First draft (not published)
May 2, 1986.
%revised date;
%subsequently revised.
\bye